Use mutate() to add new variables or modify existing ones.
For example, the pulse dataset has two pulse measurements. Suppose we are interested in the average pulse and want it to be available as a separate variable, e.g. averagePulse, in the pulse tibble. We can do this with:
mutate(pulse, averagePulse = (pulse1+pulse2)/2)
# A tibble: 110 x 14
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year averagePulse
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993 87
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993 116
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993 136
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993 72
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993 89
6 1993_F George 184 74 22 male no yes low ran 78 141 1993 110.
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993 70
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993 74
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993 68
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993 119
# … with 100 more rows
By default the new column is added at the last position in the tibble.
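If the last position is not where you want the new column, the .before and .after arguments of mutate() (available from dplyr 1.0.0) can be used to place it. The sketch below uses a small made-up tibble, toy, as a stand-in for the pulse data; the name toy and its values are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy data standing in for the pulse tibble
toy <- tibble(
  id     = c("a", "b"),
  pulse1 = c(86, 82),
  pulse2 = c(88, 150)
)

# Default behaviour: the new column is appended at the end
last <- mutate(toy, averagePulse = (pulse1 + pulse2) / 2)
names(last)

# .after (or .before) places the new column relative to an existing one
placed <- mutate(toy, averagePulse = (pulse1 + pulse2) / 2, .after = id)
names(placed)
```

In this sketch, names(placed) lists averagePulse directly after id, while names(last) lists it at the end.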
Does the pulse tibble now contain the variable averagePulse?
Answer: No. mutate() returns a modified copy and leaves the original tibble untouched. If you want to keep the new variable averagePulse, you need to use the assignment operator <- to replace the original pulse tibble with the modified version:
pulse <- mutate(pulse, averagePulse = (pulse1+pulse2)/2)
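A quick way to convince yourself of this is to check the column names before and after the assignment. This is a minimal sketch on a made-up tibble, toy (a hypothetical stand-in for pulse, not part of the dataset):

```r
library(dplyr)

# Hypothetical toy data standing in for the pulse tibble
toy <- tibble(pulse1 = c(86, 82), pulse2 = c(88, 150))

# Without assignment the result is only returned (and printed at the console);
# toy itself is left unchanged
mutate(toy, averagePulse = (pulse1 + pulse2) / 2)
before <- "averagePulse" %in% names(toy)

# With assignment the modified copy replaces the original
toy <- mutate(toy, averagePulse = (pulse1 + pulse2) / 2)
after <- "averagePulse" %in% names(toy)

c(before = before, after = after)
```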
As another example, take the body mass index (BMI): \[BMI=\frac{weight_{kg}}{{height_m}^2}\]
Note that the BMI definition requires weight in kilograms and height in metres. In the pulse dataset weight is given in kilograms but height is in centimetres. We can therefore first create a new variable height_metre containing the height in metres and then calculate BMI:
pulse_bmi <- mutate(pulse, height_metre=height/100) # convert centimetres to metres
pulse_bmi
# A tibble: 110 x 14
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year height_metre
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993 1.73
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993 1.79
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993 1.67
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993 1.95
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993 1.73
6 1993_F George 184 74 22 male no yes low ran 78 141 1993 1.84
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993 1.62
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993 1.69
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993 1.64
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993 1.68
# … with 100 more rows
The pulse_bmi tibble now has the height in metres, so we can calculate BMI:
pulse_bmi <- mutate(pulse_bmi, BMI=weight/(height_metre^2))
pulse_bmi
# A tibble: 110 x 15
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year height_metre BMI
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993 1.73 19.0
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993 1.79 18.1
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993 1.67 22.2
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993 1.95 22.1
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993 1.73 21.4
6 1993_F George 184 74 22 male no yes low ran 78 141 1993 1.84 21.9
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993 1.62 21.7
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993 1.69 19.3
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993 1.64 20.8
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993 1.68 21.3
# … with 100 more rows
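As a quick check, the BMI in the first row (weight 57 kg, height 1.73 m) can be worked out by hand from the definition: \[BMI=\frac{57}{1.73^2}=\frac{57}{2.9929}\approx 19.0\]
This agrees with the value shown in the BMI column.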
Alternatively, you may skip the creation of height_metre and calculate BMI directly from the pulse tibble:
pulse_bmi <- mutate(pulse, BMI=weight/((height/100)^2))
pulse_bmi
# A tibble: 110 x 14
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year BMI
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 18 female no yes moderate sat 86 88 1993 19.0
2 1993_B Melanie 179 58 19 female no yes moderate ran 82 150 1993 18.1
3 1993_C Consuelo 167 62 18 female no yes high ran 96 176 1993 22.2
4 1993_D Travis 195 84 18 male no yes high sat 71 73 1993 22.1
5 1993_E Lauri 173 64 18 female no yes low sat 90 88 1993 21.4
6 1993_F George 184 74 22 male no yes low ran 78 141 1993 21.9
7 1993_G Cherry 162 57 20 female no yes moderate sat 68 72 1993 21.7
8 1993_H Francesca 169 55 18 female no yes moderate sat 71 77 1993 19.3
9 1993_I Sonja 164 56 19 female no yes high sat 68 68 1993 20.8
10 1993_J Troy 168 60 23 male no yes moderate ran 88 150 1993 21.3
# … with 100 more rows
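A third option is to do both steps in a single mutate() call, since a variable created earlier in the call can be used by later expressions in the same call. The sketch below uses a small made-up tibble, toy, with two rows of weight and height values; the name toy is an assumption for illustration.

```r
library(dplyr)

# Hypothetical toy data: weight in kilograms, height in centimetres
toy <- tibble(weight = c(57, 84), height = c(173, 195))

# height_metre is created first and reused for BMI within the same call
toy_bmi <- mutate(
  toy,
  height_metre = height / 100,         # convert centimetres to metres
  BMI          = weight / height_metre^2
)
toy_bmi
```

Compared with the direct calculation, this keeps height_metre in the result, which may be useful if the height in metres is needed again later.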
In the examples above we added a new variable to our dataset, but you can also update an existing variable. For example, let’s say we want to have the age expressed (roughly) in days instead of years:
mutate(pulse, age=age*365)
# A tibble: 110 x 13
id name height weight age gender smokes alcohol exercise ran pulse1 pulse2 year
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 1993_A Bonnie 173 57 6570 female no yes moderate sat 86 88 1993
2 1993_B Melanie 179 58 6935 female no yes moderate ran 82 150 1993
3 1993_C Consuelo 167 62 6570 female no yes high ran 96 176 1993
4 1993_D Travis 195 84 6570 male no yes high sat 71 73 1993
5 1993_E Lauri 173 64 6570 female no yes low sat 90 88 1993
6 1993_F George 184 74 8030 male no yes low ran 78 141 1993
7 1993_G Cherry 162 57 7300 female no yes moderate sat 68 72 1993
8 1993_H Francesca 169 55 6570 female no yes moderate sat 71 77 1993
9 1993_I Sonja 164 56 6935 female no yes high sat 68 68 1993
10 1993_J Troy 168 60 8395 male no yes moderate ran 88 150 1993
# … with 100 more rows
Here we keep the variable age but change its unit from years to days.
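Updating a variable in this way leaves the number of columns unchanged, because the existing column is overwritten rather than a new one being added. A minimal sketch on a made-up tibble, toy (hypothetical, not the pulse data), illustrates the difference from adding a column under a new name such as age_days, which is also an illustrative name:

```r
library(dplyr)

# Hypothetical toy data standing in for the pulse tibble
toy <- tibble(name = c("A", "B"), age = c(18, 19))

# Reusing an existing name overwrites that column
updated <- mutate(toy, age = age * 365)

# Using a new name keeps the original column and adds another one
extended <- mutate(toy, age_days = age * 365)

ncol(toy)       # columns in the original
ncol(updated)   # same count: age was replaced
ncol(extended)  # one more: age_days was added
```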
Copyright © 2021 Biomedical Data Sciences (BDS) | LUMC